I realize there's a Zipline Google Group, and also posts about running zipline backtests in parallel, but this question is different: how does Quantopian quickly load large volumes of data? When running minute backtests over several years with hundreds or thousands of symbols, the data grows large, causing two problems I'm trying to solve (and believe Quantopian has already solved):
- How is this data loaded quickly when a backtest starts? Databases seem slow since this data can run into the gigabytes. Is the data held in memcached? If so, how is it broken up? (See the sketch after this list for the kind of chunking I have in mind.)
- How is this data shared between backtests? Quantopian essentially lets many backtests run in parallel, and while they use different algos, they all draw on the same data. Does each algo instance really get its own copy of the historical data in memory, or is something like memcached used so that each algo pulls its daily or minute data from the cache as needed?
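For concreteness, here's the kind of chunking I'm picturing for the first question. This is purely a sketch of what I mean, not a claim about how Quantopian actually does it: it assumes a local memcached on the default port, the pymemcache client, and a made-up key scheme of one cache entry per (symbol, trading day) of minute bars.

```python
# Illustrative only: one memcached entry per (symbol, trading day) of minute bars.
# The key scheme and helper names are my own invention, not Quantopian's.
import pickle

import pandas as pd
from pymemcache.client.base import Client

cache = Client(('localhost', 11211))  # assumes a local memcached on the default port


def cache_minute_bars(symbol, day, bars):
    """Store one symbol-day of minute bars (a DataFrame) as a single cache entry."""
    key = '{}:{}'.format(symbol, day)   # e.g. 'AAPL:2014-01-02'
    # ~390 rows of OHLCV pickles to well under memcached's default 1MB value limit
    cache.set(key, pickle.dumps(bars))


def load_minute_bars(symbol, day):
    """Fetch one symbol-day back out of the cache, or None on a miss."""
    raw = cache.get('{}:{}'.format(symbol, day))
    return pickle.loads(raw) if raw is not None else None
```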
In a zipline scenario, if I want to test the same algo over the same time range with the same data, the parallel tests could save a lot of time if they could share the data object in memory. Even if I memcache the historical data and then load it into a Pandas Panel for each test, that creates a complete copy of the static data for every parallel test (i.e. one copy per CPU core, plus what's already in memcached!). That seems like a waste of RAM.
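The best alternative I can come up with is to lean on copy-on-write fork semantics instead of giving each test its own Panel: load the history once in the parent process and fork the workers, so read-only access doesn't duplicate the underlying arrays. This is only a sketch of that workaround (load_history and run_one_backtest are placeholders, not real zipline entry points), and it assumes Linux/macOS where the 'fork' start method is available.

```python
# Workaround sketch, not anything Quantopian does: load the data once in the
# parent, then fork the parallel backtests so they read the same memory pages
# copy-on-write instead of each building its own copy.
import multiprocessing as mp

import pandas as pd

HISTORY = None  # module-level so forked workers inherit it


def load_history():
    # stand-in for the real load, e.g. from HDF5 or memcached
    return pd.DataFrame({'close': range(1000)})


def run_one_backtest(params):
    # read-only access: the workers never write to HISTORY, so its pages stay shared
    return HISTORY['close'].mean() * params


if __name__ == '__main__':
    HISTORY = load_history()
    ctx = mp.get_context('fork')        # explicit: we rely on fork's copy-on-write
    with ctx.Pool(processes=4) as pool:
        results = pool.map(run_one_backtest, [1, 2, 3, 4])
    print(results)
```

It's not perfect (Python's refcounting can still dirty some shared pages), but it avoids one full copy of the static data per core, which is exactly the waste I'm trying to get rid of.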